13 research outputs found

ProtFIM: Fill-in-Middle Protein Sequence Design via Protein Language Models

Author: Lee Youhan
Yu Hasun
Publication date: 29/03/2023

Protein language models (pLMs), pre-trained via causal language modeling on protein sequences, have been a promising tool for protein sequence design. In real-world protein engineering, there are many cases where the amino acids in the middle of a protein sequence are optimized while the other residues are maintained. Unfortunately, because of the left-to-right nature of pLMs, existing pLMs modify suffix residues by prompting with prefix residues, which is insufficient for the infilling task, where the whole surrounding context must be considered. To find more effective pLMs for protein engineering, we design a new benchmark, Secondary structureE InFilling rEcoveRy (SEIFER), which approximates infilling sequence design scenarios. By evaluating existing models on the benchmark, we reveal the weakness of existing language models and show that language models trained via fill-in-middle transformation, called ProtFIM, are more appropriate for protein engineering. We also demonstrate, through exhaustive experiments and visualizations, that ProtFIM generates protein sequences with decent protein representations. Comment: Preprint
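The fill-in-middle transformation mentioned in this abstract can be illustrated with a minimal sketch. It assumes the common prefix/suffix/middle rearrangement used for fill-in-middle training in general; the sentinel tokens `<PRE>`, `<SUF>`, `<MID>` and both function names are hypothetical, since the abstract does not specify the tokens or span-sampling scheme ProtFIM actually uses.

```python
import random

# Hypothetical sentinel tokens; the abstract does not name the ones ProtFIM uses.
PRE, SUF, MID = "<PRE>", "<SUF>", "<MID>"


def fim_transform(sequence, start, end):
    """Move the span [start, end) to the end so a left-to-right model
    sees both the prefix and the suffix before generating the middle."""
    if not 0 <= start <= end <= len(sequence):
        raise ValueError("span must lie inside the sequence")
    prefix, middle, suffix = sequence[:start], sequence[start:end], sequence[end:]
    return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}"


def random_fim_transform(sequence, rng):
    """Pick a random middle span (an assumed sampling scheme) and transform."""
    start, end = sorted(rng.randrange(len(sequence) + 1) for _ in range(2))
    return fim_transform(sequence, start, end)


# Toy amino-acid string; residues 3..5 ("AYI") become the infilling target.
print(fim_transform("MKTAYIAKQR", 3, 6))
print(random_fim_transform("MKTAYIAKQR", random.Random(0)))
```

At inference time, a model trained on such strings would be prompted with everything up to and including the middle sentinel, so that the generated residues are conditioned on both sides of the gap.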

arXiv.org e-Print Archive

Solvent: A Framework for Protein Folding

Author: Han Kyeongtak
Kim Jaehoon
Lee Jaemyung
Lee Youhan
Yu Hasun
Publication date: 19/07/2023

Consistency and reliability are crucial for conducting AI research. Many well-known research fields, such as object detection, have been compared and validated with solid benchmark frameworks. After AlphaFold2, the protein folding task has entered a new phase, and many methods have been proposed based on components of AlphaFold2. A unified research framework for protein folding, containing implementations and benchmarks, is important for comparing various approaches consistently and fairly. To achieve this, we present Solvent, a protein folding framework that supports significant components of state-of-the-art models through an off-the-shelf interface. Solvent contains different models implemented in a unified codebase and supports training and evaluation for defined models on the same dataset. We benchmark well-known algorithms and their components and provide experiments that give helpful insights into the protein structure modeling field. We hope that Solvent will increase the reliability and consistency of proposed models and improve efficiency in both speed and cost, accelerating protein folding modeling research. The code is available at https://github.com/kakaobrain/solvent, and the project will continue to be developed. Comment: preprint, 8 pages

arXiv.org e-Print Archive

A community-powered search of machine learning strategy space to find NMR property prediction models

Author: Anderson Brandon
Bai Shaojie
Bratholm Lars A.
Butts Craig P.
Choi Sunghwan
Dang Lam
Gerrard Will
Glowacki David R.
Hanchar Pavel
Howard Addison
Huard Guillaume
Kim Sanghoon
Kolter Zico
Kondor Risi
Kornbluth Mordechai
Lee Youhan
Lee Youngsoo
Mailoa Jonathan P.
Nguyen Thanh Tu
Kaggle participants
Popovic Milos
Rakocevic Goran
Reade Walter
Song Wonho
Stojanovic Luka
Thiede Erik H.
Tijanic Nebojsa
Torrubia Andres
Willmott Devin
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 13/08/2020

The rise of machine learning (ML) has created an explosion in the potential strategies for using data to make scientific predictions. For physical scientists wishing to apply ML strategies to a particular domain, it can be difficult to assess in advance what strategy to adopt within a vast space of possibilities. Here we outline the results of an online community-powered effort to swarm search the space of ML strategies and develop algorithms for predicting atomic-pairwise nuclear magnetic resonance (NMR) properties in molecules. Using an open-source dataset, we worked with Kaggle to design and host a 3-month competition, which received 47,800 ML model predictions from 2,700 teams in 84 countries. Within 3 weeks, the Kaggle community produced models with accuracy comparable to our best previously published "in-house" efforts. A meta-ensemble model constructed as a linear combination of the top predictions has a prediction accuracy which exceeds that of any individual model, 7-19x better than our previous state-of-the-art. The results highlight the potential of transformer architectures for predicting quantum mechanical (QM) molecular properties.
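The idea of a meta-ensemble built as a linear combination of model predictions can be illustrated with a small sketch. It assumes the blend weights are fitted by ordinary least squares; the abstract does not say how the weights were chosen, and the data and all function names here are synthetic and hypothetical.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_weights(preds, targets):
    """Least-squares blend weights; preds[k][i] is model k's prediction for sample i."""
    k = len(preds)
    gram = [[sum(p * q for p, q in zip(preds[i], preds[j])) for j in range(k)]
            for i in range(k)]
    rhs = [sum(p * t for p, t in zip(preds[i], targets)) for i in range(k)]
    return solve(gram, rhs)


def blend(preds, weights):
    """Weighted sum of the individual model predictions, sample by sample."""
    return [sum(w * p[i] for w, p in zip(weights, preds)) for i in range(len(preds[0]))]


def mse(pred, targets):
    return sum((p - t) ** 2 for p, t in zip(pred, targets)) / len(targets)


# Synthetic stand-ins for three models: the target plus independent noise.
rng = random.Random(0)
targets = [rng.gauss(0, 1) for _ in range(200)]
preds = [[t + rng.gauss(0, s) for t in targets] for s in (0.3, 0.4, 0.5)]

weights = fit_weights(preds, targets)
ensemble = blend(preds, weights)
print("weights:", weights)
print("ensemble MSE:", mse(ensemble, targets))
print("individual MSEs:", [mse(p, targets) for p in preds])
```

On the data used for fitting, a least-squares blend cannot have a higher squared error than any single model, because each single model corresponds to one particular choice of weights. Whether the improvement carries over to new data has to be checked on a held-out set.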

arXiv.org e-Print Archive

Explore Bristol Research

Deep learning models for predicting RNA degradation via dual crowdsourcing

Author: Amer Karim
Chiu King Yuen
Das Rhiju
Demkin Maggie
Fares Mohamed
Fujikawa Kazuki
Gao Jiayang
He Shujun
Ishi Keiichiro
Ito Takuya
Kim Do Soon
Kladwang Wipapat
Lee Youhan
Mao Hanfei
Nicol John J.
Noumi Taiga
Onodera Kazuki
Reade Walter
Romano Jonathan
Steenwinckel Bram
Tinti Michele
Tunguz Bojan
Vandewiele Gilles
Watkins Andrew M.
Wayment-Steele Hannah K.
Wellington-Oguri Roger
Öztürk Emin
Öztürk Fatih
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2022

Medicines based on messenger RNA (mRNA) hold immense potential, as evidenced by their rapid deployment as COVID-19 vaccines. However, worldwide distribution of mRNA molecules has been limited by their thermostability, which is fundamentally limited by the intrinsic instability of RNA molecules to a chemical degradation reaction called in-line hydrolysis. Predicting the degradation of an RNA molecule is a key task in designing more stable RNA-based therapeutics. Here, we describe a crowdsourced machine learning competition (‘Stanford OpenVaccine’) on Kaggle, involving single-nucleotide resolution measurements on 6,043 diverse 102–130-nucleotide RNA constructs that were themselves solicited through crowdsourcing on the RNA design platform Eterna. The entire experiment was completed in less than 6 months, and 41% of nucleotide-level predictions from the winning model were within experimental error of the ground truth measurement. Furthermore, these models generalized to blindly predicting orthogonal degradation data on much longer mRNA molecules (504–1,588 nucleotides) with improved accuracy compared with previously published models. These results indicate that such models can represent in-line hydrolysis with excellent accuracy, supporting their use for designing stabilized messenger RNAs. The integration of two crowdsourcing platforms, one for dataset creation and another for machine learning, may be fruitful for other urgent problems that demand scientific discovery on rapid timescales.

PubMed Central

University of Dundee Online Publications

Deep learning models for predicting RNA degradation via dual crowdsourcing

Messenger RNA-based medicines hold immense potential, as evidenced by their rapid deployment as COVID-19 vaccines. However, worldwide distribution of mRNA molecules has been limited by their thermostability, which is fundamentally limited by the intrinsic instability of RNA molecules to a chemical degradation reaction called in-line hydrolysis. Predicting the degradation of an RNA molecule is a key task in designing more stable RNA-based therapeutics. Here, we describe a crowdsourced machine learning competition ("Stanford OpenVaccine") on Kaggle, involving single-nucleotide resolution measurements on 6043 diverse 102-130-nucleotide RNA constructs that were themselves solicited through crowdsourcing on the RNA design platform Eterna. The entire experiment was completed in less than 6 months, and 41% of nucleotide-level predictions from the winning model were within experimental error of the ground truth measurement. Furthermore, these models generalized to blindly predicting orthogonal degradation data on much longer mRNA molecules (504-1588 nucleotides) with improved accuracy compared to previously published models. Top teams integrated natural language processing architectures and data augmentation techniques with predictions from previous dynamic programming models for RNA secondary structure. These results indicate that such models are capable of representing in-line hydrolysis with excellent accuracy, supporting their use for designing stabilized messenger RNAs. The integration of two crowdsourcing platforms, one for data set creation and another for machine learning, may be fruitful for other urgent problems that demand scientific discovery on rapid timescales.

arXiv.org e-Print Archive

PubMed Central

University of Dundee Online Publications

Reduction of diabetic atherosclerosis by RELM-α in an Ldlr gene-knockout mouse model

Author: Lee YouHan
Publication venue: Seoul National University Graduate School
Publication date: 01/02/2017

Thesis (Master's) -- Seoul National University Graduate School: Department of Veterinary Medicine, 2017. 2. Lee Hang (이항).

Resistin-like molecule (RELM)-α belongs to a family of secreted mammalian proteins that have putative immunomodulatory functions. Recent studies have identified a role of RELM-α in the pathogenesis of hyperlipidemia-induced atherosclerosis. However, whether RELM-α regulates diabetic atherosclerosis is unknown. Here we report that RELM-α has anti-atherogenic effects and protects against diabetic atherosclerosis in low-density lipoprotein receptor-deficient (LDLR-/-) mice. Severity of the induced diabetic state was confirmed by monitoring blood glucose levels and body weight. RELM-α overexpression appears to have a cholesterol-lowering effect. In particular, there was a significant difference in cholesterol levels in the diabetic group. After 8 weeks on a high-fat diet (HFD), total en face aortic lesion area was reduced in RELM-α overexpressing (RELM-α Tg) mice compared with control mice in both the non-diabetic and diabetic groups. Plaque area in the aortic arch was also decreased in RELM-α Tg mice of both groups. We show that RELM-α overexpression has a stronger anti-atherogenic effect, with a decrease in cholesterol, in diabetic atherosclerosis than in the non-diabetic group. These findings define RELM-α as a novel therapeutic target for treating diabetic atherosclerosis.

Table of contents:
Introduction
Materials and Methods
  1. Animal Studies and Diet
  2. Genotyping
  3. Antibodies
  4. Immunoblotting
  5. Streptozotocin Induced Diabetic Model and Mice Monitoring
  6. Blood Analysis
  7. Assessment of Atherosclerosis
  8. Statistical Analysis
Results
  1. The mouse model of RELM-α overexpression
  2. RELM-α overexpression reduces cholesterol in diabetic atherosclerosis mice
  3. RELM-α overexpression reduces aortic arch plaque size
  4. RELM-α overexpression decreases aortic root plaque size
List of Tables
List of Figures
Discussion
References
Abstract in Korean

SNU Open Repository and Archive

ULTRASOUND-INDUCED CHANGES IN DEPOLARIZATION OF NEONATAL VENTRICULAR CARDIOMYOCYTES

Author: Andrew Kohut
Chris Bawiec
Natasha Mehta
Peter A. Lewin
Randall A. Lee
Steven Kutalek
Youhan Sunny
Publication venue: 'Elsevier BV'

Crossref

Highly Enhanced Gas Adsorption Properties in Vertically Aligned MoS2

Author: Hae-Wook Yoo
Hee-Tae Jung
Jihan Kim
Jong-Seon Kim
Seon Joon Kim
Soo-Yeon Cho
Woo-Bin Jung
Youhan Lee
Publication venue: 'American Chemical Society (ACS)'

Crossref